Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up fixed-sized array iteration #1050

Merged
merged 4 commits into from
Mar 2, 2023
Merged

Speed up fixed-sized array iteration #1050

merged 4 commits into from
Mar 2, 2023

Conversation

jleibs
Copy link
Member

@jleibs jleibs commented Feb 2, 2023

This optimizes iteration of fixed-sized arrays, but restricts them to Primitives, which seems like a reasonable trade-off for now. We can revisit if we find a need for non-primitive fixed-sized arrays at some point in the future.

I introduced a new benchmark in re_query for Vecs as separate from points. Specifically I use 100k vec arrays so make the issue more readily apparent. For arrays of 1k the other query overhead dominates.

New vec bench

Before:

arrow_batch_vecs/insert time:   [22.727 µs 22.776 µs 22.835 µs]
                        thrpt:  [43.793 Gelem/s 43.906 Gelem/s 44.000 Gelem/s]


arrow_batch_vecs/query  time:   [2.4701 ms 2.4797 ms 2.4912 ms]
                        thrpt:  [40.142 Melem/s 40.328 Melem/s 40.484 Melem/s]

After:

arrow_batch_vecs/insert time:   [23.079 µs 23.137 µs 23.205 µs]
                        thrpt:  [43.094 Gelem/s 43.220 Gelem/s 43.329 Gelem/s]
                 change:
                        time:   [+0.5238% +1.5090% +2.4224%] (p = 0.00 < 0.05)
                        thrpt:  [-2.3651% -1.4865% -0.5211%]

arrow_batch_vecs/query  time:   [161.09 µs 161.69 µs 162.38 µs]
                        thrpt:  [615.85 Melem/s 618.45 Melem/s 620.78 Melem/s]
                 change:
                        time:   [-93.518% -93.482% -93.448%] (p = 0.00 < 0.05)
                        thrpt:  [+1426.3% +1434.1% +1442.7%]

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I've added a line to CHANGELOG.md (if this is a big enough change to warrant it)

}

let out: [T; SIZE] =
array_init::array_init(|i: usize| self.values.value(self.offset * SIZE + i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we already have this crate in the deps I guess? We should add a note and track the stabilization of the stdlib feature we actually would use instead:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did -- see the original implementation of FixedSizeArrayField::arrow_deserialize below.

@jleibs jleibs changed the base branch from cmc/feed_spatial_perf to main March 1, 2023 20:16
@jleibs jleibs added 📉 performance Optimization, memory use, etc 🏹 arrow concerning arrow labels Mar 1, 2023
@jleibs jleibs marked this pull request as ready for review March 1, 2023 20:56
// Iteration should only happen via iter_from_array_ref.
// This is a quirk of the way the traits work in arrow2_convert.
unsafe {
do_not_call_into_iter();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this trick :)

@emilk emilk merged commit 63c6983 into main Mar 2, 2023
@emilk emilk deleted the jleibs/fast_fixed_size branch March 2, 2023 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 📉 performance Optimization, memory use, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants